An Improved Policy Iteration Algorithm for Partially Observable MDPs

Neural Information Processing Systems

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
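The abstract's key point is that representing a policy as a finite-state controller makes policy evaluation a straightforward linear solve. A minimal sketch of that evaluation step, on a hypothetical two-state POMDP (the model numbers and controller below are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative tiny POMDP: 2 states, 2 actions, 2 observations.
n_states, n_actions, n_obs = 2, 2, 2
gamma = 0.9

# T[a, s, s'] = P(s' | s, a), O[a, s', o] = P(o | s', a), R[s, a] = reward.
T = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Finite-state controller: each node fixes an action and a successor
# node for every observation received.
node_action = [0, 1]             # action taken in node n
node_succ = [[0, 1], [1, 0]]     # eta(n, o): next node
n_nodes = len(node_action)

# Policy evaluation: solve the linear system
#   V(n, s) = R(s, a_n)
#           + gamma * sum_{s', o} T(s'|s,a_n) O(o|s',a_n) V(eta(n, o), s')
N = n_nodes * n_states
A = np.eye(N)
b = np.zeros(N)
for n in range(n_nodes):
    a = node_action[n]
    for s in range(n_states):
        i = n * n_states + s
        b[i] = R[s, a]
        for sp in range(n_states):
            for o in range(n_obs):
                j = node_succ[n][o] * n_states + sp
                A[i, j] -= gamma * T[a, s, sp] * O[a, sp, o]

V = np.linalg.solve(A, b).reshape(n_nodes, n_states)
print(V)  # value of each controller node in each hidden state
```

Because the controller has finitely many nodes, evaluation is just one linear system in `n_nodes * n_states` unknowns, which is the simplification the abstract highlights.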


Bootstrapping LPs in Value Iteration for Multi-Objective and Partially Observable MDPs

Roijers, Diederik M. (Vrije Universiteit Brussel,  Vrije Universiteit Amsterdam) | Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)

AAAI Conferences

Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.
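The core idea, seeding an LP's constraint set with constraints that mattered in a similar, previously solved LP, can be sketched with a generic constraint-generation loop. Everything below (the toy LP, the `solve_by_generation` helper, the perturbation) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linprog

def solve_by_generation(c, A, b, warm_rows=()):
    """Solve min c.x s.t. A x <= b, 0 <= x <= 10 by iteratively adding
    the most-violated constraint; warm_rows bootstraps the initial subset."""
    active = list(warm_rows)
    lp_solves = 0
    while True:
        res = linprog(c,
                      A_ub=A[active] if active else None,
                      b_ub=b[active] if active else None,
                      bounds=[(0, 10)] * len(c), method="highs")
        lp_solves += 1
        viol = A @ res.x - b
        worst = int(np.argmax(viol))
        if viol[worst] <= 1e-9:          # all constraints satisfied
            return res.x, active, lp_solves
        active.append(worst)             # generate one more constraint

c = np.array([-1.0, -1.0])               # maximize x + y
A = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [1.0, 0.0]])
b = np.array([8.0, 10.0, 15.0, 6.0])

x1, tight, n1 = solve_by_generation(c, A, b)
# A "similar" LP (perturbed right-hand side), bootstrapped with the
# constraint rows that were generated for the first LP.
x2, _, n2 = solve_by_generation(c, A, b + 0.1, warm_rows=tight)
print(n1, n2)  # the bootstrapped solve needs fewer LP iterations
```

In the paper's setting the successive LPs come from Bellman backups and so become increasingly similar as the algorithm converges, which is exactly when this kind of warm-starting pays off most.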


Learning Policies in Partially Observable MDPs with Abstract Actions Using Value Iteration

Janzadeh, Hamed (The University of Texas at Arlington) | Huber, Manfred (The University of Texas at Arlington)

AAAI Conferences

While the use of abstraction, and its benefits for transferring learned information to new tasks, has been studied extensively and successfully in MDPs, it has not been studied in the context of Partially Observable MDPs. This paper addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options). It shows that the optimal value function remains piecewise-linear and convex when policies use high-level actions, and shows how value iteration algorithms can be modified to support options. The results can be applied to all existing value iteration algorithms. Experiments show how adding options can speed up the learning process.
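The piecewise-linear-and-convex (PWLC) property the abstract relies on means the value function can be stored as a finite set of alpha-vectors, with the value of a belief being the maximum dot product over that set. A minimal sketch of the standard exact value-iteration backup that preserves this representation, on a hypothetical tiny POMDP (the model is illustrative; the paper's extension to options is not reproduced here):

```python
import itertools
import numpy as np

# Illustrative tiny POMDP: 2 states, 2 actions, 2 observations.
n_states, n_actions, n_obs = 2, 2, 2
gamma = 0.95
T = np.array([[[0.9, 0.1], [0.1, 0.9]],     # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.2, 0.8]],     # O[a, s', o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                   # R[s, a]
              [0.0, 1.0]])

def backup(Gamma):
    """One exact DP backup: the result is again a finite set of
    alpha-vectors, so the value function stays PWLC."""
    new = []
    for a in range(n_actions):
        # g[o][i](s) = gamma * sum_s' T[a,s,s'] O[a,s',o] alpha_i(s')
        g = [[gamma * (T[a] * O[a][:, o]) @ alpha for alpha in Gamma]
             for o in range(n_obs)]
        # One new vector per choice of successor alpha for each observation.
        for choice in itertools.product(range(len(Gamma)), repeat=n_obs):
            new.append(R[:, a] + sum(g[o][choice[o]] for o in range(n_obs)))
    return np.unique(np.round(new, 10), axis=0)  # crude duplicate pruning

Gamma = [np.zeros(n_states)]
for _ in range(3):
    Gamma = list(backup(Gamma))

V = lambda belief: max(belief @ alpha for alpha in Gamma)  # PWLC value
print(V(np.array([0.5, 0.5])))
```

The paper's result is that replacing the primitive actions `a` in this backup with options leaves the alpha-vector representation intact, so any such value-iteration algorithm can be adapted.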